Ordinary Differential Equation Methods for Markov Decision Processes and Application to Kullback-Leibler Control Cost

Authors

  • Ana Busic
  • Sean P. Meyn
Abstract

A new approach to computation of optimal policies for MDP (Markov decision process) models is introduced. The main idea is to solve not one, but an entire family of MDPs, parameterized by a weighting factor $\zeta$ that appears in the one-step reward function. For an MDP with $d$ states, the family of value functions $\{h^*_\zeta : \zeta \in \mathbb{R}\}$ is the solution to an ODE, $\frac{d}{d\zeta} h^*_\zeta = \mathcal{V}(h^*_\zeta)$, where the vector field $\mathcal{V} : \mathbb{R}^d \to \mathbb{R}^d$ has a simple form, based on a matrix inverse. This general methodology is applied to a family of average-cost optimal control models in which the one-step reward function is defined by Kullback-Leibler divergence. The motivation for this reward function in prior work is computation: the solution to the MDP can be expressed in terms of the Perron-Frobenius eigenvector for an associated positive matrix. The drawback with this approach is that no hard constraints on the control are permitted. It is shown here that it is possible to extend this framework to model randomness from nature that cannot be modified by the controller. Perron-Frobenius theory is no longer applicable – the resulting dynamic programming equations appear as complex as a completely unstructured MDP model. Despite this apparent complexity, it is shown that this class of MDPs admits a solution via this new ODE technique. This approach is new and practical even for the simpler problem in which randomness from nature is absent.
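The idea of tracing a whole family of value functions along $\zeta$ can be illustrated with a small numerical sketch. Everything in it is an assumption made for illustration and is not taken from the paper: it uses a discounted, finite-action MDP (not the average-cost Kullback-Leibler model of the abstract), a reward of the hypothetical form r0 + ζ·r1, and a vector field obtained by differentiating the Bellman equation while the greedy policy is held fixed, which gives (I − γP_φ)⁻¹ r1_φ. The function names (`greedy`, `vector_field`, `solve_mdp`, `euler_path`) are likewise hypothetical. In this sketch the greedy policy, and hence the vector field, depends on ζ as well as on h.

```python
import numpy as np

def greedy(P, r0, r1, gamma, h, zeta):
    """Greedy action per state for reward r0 + zeta*r1 and value guess h."""
    Q = r0 + zeta * r1 + gamma * (P @ h)   # shape (actions, states)
    return Q.argmax(axis=0)

def policy_matrices(P, r, phi):
    """Transition matrix and reward vector induced by policy phi."""
    idx = np.arange(P.shape[1])
    return P[phi, idx, :], r[phi, idx]

def vector_field(P, r1, gamma, phi):
    """Assumed vector field: (I - gamma*P_phi)^{-1} r1_phi (a matrix inverse)."""
    P_phi, r1_phi = policy_matrices(P, r1, phi)
    return np.linalg.solve(np.eye(len(phi)) - gamma * P_phi, r1_phi)

def solve_mdp(P, r0, r1, gamma, zeta, max_iter=100):
    """Reference solution for one value of zeta, by policy iteration."""
    d = P.shape[1]
    phi = np.zeros(d, dtype=int)
    for _ in range(max_iter):
        P_phi, r_phi = policy_matrices(P, r0 + zeta * r1, phi)
        h = np.linalg.solve(np.eye(d) - gamma * P_phi, r_phi)
        new_phi = greedy(P, r0, r1, gamma, h, zeta)
        if np.array_equal(new_phi, phi):
            break
        phi = new_phi
    return h

def euler_path(P, r0, r1, gamma, zeta_end, steps):
    """Integrate dh/dzeta = V(h, zeta) from zeta = 0 with forward Euler."""
    h = solve_mdp(P, r0, r1, gamma, 0.0)   # initial condition at zeta = 0
    dz = zeta_end / steps
    for k in range(steps):
        phi = greedy(P, r0, r1, gamma, h, k * dz)
        h = h + dz * vector_field(P, r1, gamma, phi)
    return h

# Hypothetical random instance: 2 actions, d = 3 states.
rng = np.random.default_rng(0)
n_actions, d, gamma = 2, 3, 0.8
P = rng.random((n_actions, d, d))
P /= P.sum(axis=2, keepdims=True)          # make each row a probability vector
r0 = rng.random((n_actions, d))
r1 = rng.random((n_actions, d))

h_ode = euler_path(P, r0, r1, gamma, zeta_end=1.0, steps=10000)
h_direct = solve_mdp(P, r0, r1, gamma, zeta=1.0)
print("max gap between ODE path and direct solve:", np.max(np.abs(h_ode - h_direct)))
```

In this discounted setting the optimal value function is piecewise linear in ζ, so the Euler step is exact between policy switches and the only error comes from steps that straddle a switch. One MDP is solved directly at ζ = 0; every other member of the family is then reached with one small linear solve per step.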


Similar resources

KL-learning: Online solution of Kullback-Leibler control problems

We introduce a stochastic approximation method for the solution of an ergodic Kullback-Leibler control problem. A Kullback-Leibler control problem is a Markov decision process on a finite state space in which the control cost is proportional to a Kullback-Leibler divergence of the controlled transition probabilities with respect to the uncontrolled transition probabilities. The algorithm discus...


Comparison of Kullback-Leibler, Hellinger and LINEX with Quadratic Loss Function in Bayesian Dynamic Linear Models: Forecasting of Real Price of Oil

In this paper we examine the application of Kullback-Leibler, Hellinger and LINEX loss functions in a Dynamic Linear Model (DLM), using 106 years of data on the real price of oil, from 1913 to 2018, with attention to the asymmetry problem in filtering and forecasting. We use the DLM form of the basic Hotelling Model under Quadratic loss function, Kullback-Leibler, Hellinger and LINEX trying to address the ...


Fast Communication: Gaussian Approximations of Small Noise Diffusions in Kullback–Leibler Divergence

We study Gaussian approximations to the distribution of a diffusion. The approximations are easy to compute: they are defined by two simple ordinary differential equations for the mean and the covariance. Time correlations can also be computed via solution of a linear stochastic differential equation. We show, using the Kullback–Leibler divergence, that the approximations are accurate in the sm...


Gaussian Approximations of Small Noise Diffusions in Kullback-Leibler Divergence

We study Gaussian approximations to the distribution of a diffusion. The approximations are easy to compute: they are defined by two simple ordinary differential equations for the mean and the covariance. Time correlations can also be computed via solution of a linear stochastic differential equation. We show, using the Kullback-Leibler divergence, that the approximations are accurate...


Extended Geometric Processes: Semiparametric Estimation and Application to Reliability

Keywords: imperfect repair, Markov renewal equation, replacement policy

Lam (2007) introduces a generalization of renewal processes named Geometric processes, where inter-arrival times are independent and identically distributed up to a multiplicative scale parameter, in a geometric fashion. We here envision a more general scaling, not necessarily geometric. The corresponding counting process is named Extended Geometric Process (EGP). Semiparametric estimates are...



Journal:
  • SIAM J. Control and Optimization

Volume 56

Publication date: 2018